{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introducing Keras\n",
    "\n",
    "In the next cell, we introduce [Keras](http://keras.io/), a high-level library for machine learning which we will use for the rest of the class. Keras is built on top of [Tensorflow](https://tensorflow.org/), which is an open-source framework which impelments machine learning methodology, particularly that of deep neural networks, by optimizing the efficiency of the computation. We do not have to deal so much with the details of this. For our purposes, Tensorflow is also a very low-level library which is not necessarily accessible to the typical engineer. Keras solves this by creating a wrapper around Tensorflow, reducing the complexity of coding neural networks, and giving us a set of convenient functions which implement lots of reusable routines. Most importantly, Keras (via Tensorflow) efficiently implement backpropagation to train neural networks on the GPU. Effectively, you could say that Keras is to Tensorflow what [Processing](http://processing.org/) is to Java.\n",
    "\n",
    "To start, we will re-implement what we did in the last section, a neural network to classify the Iris dataset, but this time we will use Keras.\n",
    "\n",
    "Start by importing the relevant Keras libraries that we will be using, as well as matplotlib and numpy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import random\n",
    "\n",
    "import keras\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Dropout"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load the Iris dataset again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "\n",
    "iris = load_iris()\n",
    "data, labels = iris.data[:,0:3], iris.data[:,3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last lesson, we manually trained a neural network to predict the sepal width of the Iris flowers. This time, let's use the Keras library instead. First we need to shuffle and pre-process the data. Pre-processing in this case is normalization of the data, as well as converting it to a properly-shaped numpy array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape of X (150, 3)\n",
      "first 5 rows of X\n",
      " [[0.79746836 0.77272725 0.8115942 ]\n",
      " [0.6962025  0.5681818  0.5797101 ]\n",
      " [0.96202534 0.6818182  0.95652175]\n",
      " [0.8101266  0.70454544 0.79710144]\n",
      " [0.5949367  0.72727275 0.23188406]]\n",
      "first 5 labels\n",
      " [0.96 0.52 0.84 0.72 0.08]\n"
     ]
    }
   ],
   "source": [
    "num_samples = len(labels)  # size of our dataset\n",
    "shuffle_order = np.random.permutation(num_samples)\n",
    "data = data[shuffle_order, :]\n",
    "labels = labels[shuffle_order]\n",
    "\n",
    "# normalize data and labels to between 0 and 1 and make sure it's float32\n",
    "data = data / np.amax(data, axis=0)\n",
    "data = data.astype('float32')\n",
    "labels = labels / np.amax(labels, axis=0)\n",
    "labels = labels.astype('float32')\n",
    "\n",
    "# print out the data\n",
    "print(\"shape of X\", data.shape)\n",
    "print(\"first 5 rows of X\\n\", data[0:5, :])\n",
    "print(\"first 5 labels\\n\", labels[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Overfitting and validation\n",
    "\n",
    "In our previous guides, we always evaluated the performance of the network on the same data that we trained it on. But this is wrong; our network could learn to \"cheat\" by overfitting to the training data (like memorizing it) so as to get a high score, but then not generalize well to actually unknown examples.\n",
    "\n",
    "In machine learning, this is called \"overfitting\" and there are several things we have to do to avoid it. The first thing is we must split our dataset into a \"training set\" which we train on with gradient descent, and a \"test set\" which is hidden from the training process that we can do a final evaluation on to get the true accuracy, that of the network trying to predict unknown samples.\n",
    "\n",
    "Let's split the data into a training set and a test set. We'll keep the first 30% of the dataset to use as a test set, and use the rest for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "105 training samples, 45 test samples\n"
     ]
    }
   ],
   "source": [
    "# let's rename the data and labels to X, y\n",
    "X, y = data, labels\n",
    "\n",
    "test_split = 0.3  # percent split\n",
    "\n",
    "n_test = int(test_split * num_samples)\n",
    "\n",
    "x_train, x_test = X[n_test:, :], X[:n_test, :] \n",
    "y_train, y_test = y[n_test:], y[:n_test] \n",
    "\n",
    "print('%d training samples, %d test samples' % (x_train.shape[0], x_test.shape[0]))"
   ]
  },
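  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The manual slicing above can also be expressed with scikit-learn's `train_test_split` helper. The cell below is a minimal sketch rather than part of the original lesson: it reloads Iris into its own variables (`X_demo`, `y_demo` and the `*_demo` splits are illustrative names) so that it does not overwrite `x_train` or `x_test`, and `random_state=0` is an arbitrary choice that makes the shuffle repeatable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "iris_demo = load_iris()\n",
    "X_demo, y_demo = iris_demo.data[:, 0:3], iris_demo.data[:, 3]\n",
    "\n",
    "# hold out 30% of the samples; train_test_split shuffles before splitting\n",
    "x_train_demo, x_test_demo, y_train_demo, y_test_demo = train_test_split(\n",
    "    X_demo, y_demo, test_size=0.3, random_state=0)\n",
    "\n",
    "print('%d training samples, %d test samples' % (x_train_demo.shape[0], x_test_demo.shape[0]))"
   ]
  },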
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Keras, to instantiate a neural network model, we use the `Sequential` class. Sequential simply means a model with a sequence of layers which propagate in one direction, from input to output. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have an empty neural network called `model`. Now let's add our first layer, which will be our input layer. We will do this using Keras's `Dense` class which will instantiate our input layer.\n",
    "\n",
    "The reason why it is called \"Dense\" is that the layer is \"fully-connected,\" which means that all of it's neurons are connected to all the neurons in the previous layer, with no empty connections. This may seem confusing at first because we have not yet seen neural network layers which are not fully-connected; we will see this in the next chapter when we introduce convolutional networks. \n",
    "\n",
    "To create a Dense layer, we have two arguments that need to be specified: the number of neurons and the activation function (which non-linearity, if any). For the first layer, we must also specify the input dimension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.add(Dense(8, activation='sigmoid', input_dim=3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also get a readout of the current state of the network using `model.summary`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "dense_16 (Dense)             (None, 8)                 32        \n",
      "=================================================================\n",
      "Total params: 32\n",
      "Trainable params: 32\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our network currently has one layer with 32 parameters: that's 3 neurons in the input layer, times 8 neurons in the middle layer (3x8=24), plus 8 biases (24+8=32).\n",
    "\n",
    "Next, we will add the output layer, which will be a fully-connected (Dense) layer whose size is 1 neuron. This neuron will contain our final output.\n",
    "\n",
    "Notice that this time, instead of having the activation be a sigmoid as before, we leave it as a \"linear\" activation (no non-linearity). This is common for the final output.\n",
    "\n",
    "We add it, and look at the final summary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_________________________________________________________________\n",
      "Layer (type)                 Output Shape              Param #   \n",
      "=================================================================\n",
      "dense_16 (Dense)             (None, 8)                 32        \n",
      "_________________________________________________________________\n",
      "dense_17 (Dense)             (None, 1)                 9         \n",
      "=================================================================\n",
      "Total params: 41\n",
      "Trainable params: 41\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model.add(Dense(1, activation='linear'))\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we've added 9 parameters, 8x1 weights between the hidden and output layers, and 1 bias in the output. So we have 41 parameters in total."
   ]
  },
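  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the count of 41 parameters concrete, the following sketch reproduces the shapes of the two `Dense` layers in plain numpy. It is an illustration under stated assumptions, not how Keras stores or initializes its weights: the arrays `W1`, `b1`, `W2`, `b2` and the helper `forward` are hypothetical names, and the weights are drawn from a standard normal distribution just to have numbers to work with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# hypothetical weights with the same shapes as the two Dense layers\n",
    "W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)  # 3 inputs -> 8 hidden neurons\n",
    "W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # 8 hidden neurons -> 1 output\n",
    "\n",
    "def forward(x):\n",
    "    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # sigmoid activation\n",
    "    return hidden @ W2 + b2                        # linear activation (no non-linearity)\n",
    "\n",
    "# weights and biases of both layers added together\n",
    "n_params = W1.size + b1.size + W2.size + b2.size\n",
    "print('parameters:', n_params)\n",
    "print('output shape for 5 samples:', forward(np.ones((5, 3))).shape)"
   ]
  },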
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are finished specifying the architecture of the model. Now we need to specify our loss function and optimizer, and then compile the model. Let's discuss each of these things.\n",
    "\n",
    "First, we specify the loss. The standard for regression, as we said before is sum-squared error (SSE) or mean-squared error (MSE). SSE and MSE are basically the same, since the only difference between them is a scaling factor ($\\frac{1}{n}$) which doesn't depend on the final weights. Keras happens to use MSE for evaluation rather than SEE, so we will use that.\n",
    "\n",
    "The optimizer is the flavor of gradient descent we want. The most basic optimizer is \"stochastic gradient descent\" or SGD which is the learning algorithm we have used so far. We have mostly used batch gradient descent so far, which means we compute our gradient over the entire dataset. For reasons which will be more clear when we cover learning algorithms in more detail, this is not usually favored, and we instead calculate the gradient over random subsets of the training data, called mini-batches.\n",
    "\n",
    "Once we've specified our loss function and optimizer, the model is compiled. Compiling means that Keras (actually Tensorflow internally) is allocating memory for a \"computational graph\" whose architecture is that which is specified by your model definition. This is done for optimization purposes, and a full understanding of how that's done is not necessary for this course and is beyond its scope."
   ]
  },
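  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the claim that SSE and MSE differ only by a constant factor, the sketch below computes both for a small set of made-up targets and predictions. The arrays `y_true` and `y_pred` are illustrative values, not results from our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "y_true = np.array([0.2, 0.5, 0.9, 0.4])  # made-up targets\n",
    "y_pred = np.array([0.1, 0.7, 0.8, 0.4])  # made-up predictions\n",
    "\n",
    "errors = y_pred - y_true\n",
    "sse = np.sum(errors ** 2)   # sum-squared error\n",
    "mse = np.mean(errors ** 2)  # mean-squared error\n",
    "\n",
    "# the two differ only by the factor 1/n\n",
    "print('SSE = %.4f, MSE = %.4f, SSE / n = %.4f' % (sse, mse, sse / len(y_true)))"
   ]
  },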
  {
   "cell_type": "code",
   "execution_count": 145,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.compile(loss='mean_squared_error', optimizer='sgd')"
   ]
  },
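  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate what a mini-batch is, the sketch below splits a shuffled set of sample indices into batches, the way one epoch of mini-batch training would visit them. It is a simplified picture rather than Keras's internal code: `iterate_minibatches` is a hypothetical helper, and the batch size of 10 is an arbitrary value chosen for the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def iterate_minibatches(n_samples, batch_size, rng):\n",
    "    # visit every sample once per epoch, in a fresh random order\n",
    "    order = rng.permutation(n_samples)\n",
    "    for start in range(0, n_samples, batch_size):\n",
    "        yield order[start:start + batch_size]\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "batches = list(iterate_minibatches(105, 10, rng))\n",
    "\n",
    "# each batch would trigger one gradient step\n",
    "print('%d gradient steps per epoch' % len(batches))\n",
    "print('size of the last batch: %d' % len(batches[-1]))"
   ]
  },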
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are finally ready to train. In the next cell, we run the `fit` command which will begin the process of training. There are several important arguments to `fit`. The first is the training data and labels (`x_train` and `y_train`), as well as the validation set (`x_test` and `y_test`). \n",
    "\n",
    "Additionally, we must specify the `batch_size` which refers to the number of training samples to calculate the gradient over (using SGD), as well as the number of `epochs`, which refers to the number of times we cycle through the training set. In general, more epochs are usually better, although in practice, the accuracy of the network may stop improving early, which makes it unnecessary to train for too many epochs.\n",
    "\n",
    "Because we have a very small dataset (just 105 samples), we should have a low batch size and can afford to train over many epochs (let's set to 200)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train on 105 samples, validate on 45 samples\n",
      "Epoch 1/200\n",
      "105/105 [==============================] - 0s 2ms/step - loss: 0.1224 - val_loss: 0.1100\n",
      "Epoch 2/200\n",
      "105/105 [==============================] - 0s 310us/step - loss: 0.0996 - val_loss: 0.1084\n",
      "Epoch 3/200\n",
      "105/105 [==============================] - 0s 406us/step - loss: 0.0991 - val_loss: 0.1059\n",
      "Epoch 4/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0966 - val_loss: 0.1050\n",
      "Epoch 5/200\n",
      "105/105 [==============================] - 0s 305us/step - loss: 0.0956 - val_loss: 0.1038\n",
      "Epoch 6/200\n",
      "105/105 [==============================] - 0s 323us/step - loss: 0.0945 - val_loss: 0.1023\n",
      "Epoch 7/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0938 - val_loss: 0.1010\n",
      "Epoch 8/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0922 - val_loss: 0.1000\n",
      "Epoch 9/200\n",
      "105/105 [==============================] - 0s 324us/step - loss: 0.0908 - val_loss: 0.0990\n",
      "Epoch 10/200\n",
      "105/105 [==============================] - 0s 310us/step - loss: 0.0900 - val_loss: 0.0975\n",
      "Epoch 11/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0888 - val_loss: 0.0966\n",
      "Epoch 12/200\n",
      "105/105 [==============================] - 0s 416us/step - loss: 0.0880 - val_loss: 0.0957\n",
      "Epoch 13/200\n",
      "105/105 [==============================] - 0s 416us/step - loss: 0.0869 - val_loss: 0.0942\n",
      "Epoch 14/200\n",
      "105/105 [==============================] - 0s 351us/step - loss: 0.0857 - val_loss: 0.0930\n",
      "Epoch 15/200\n",
      "105/105 [==============================] - 0s 327us/step - loss: 0.0850 - val_loss: 0.0919\n",
      "Epoch 16/200\n",
      "105/105 [==============================] - 0s 327us/step - loss: 0.0841 - val_loss: 0.0916\n",
      "Epoch 17/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0832 - val_loss: 0.0898\n",
      "Epoch 18/200\n",
      "105/105 [==============================] - 0s 335us/step - loss: 0.0818 - val_loss: 0.0891\n",
      "Epoch 19/200\n",
      "105/105 [==============================] - 0s 324us/step - loss: 0.0813 - val_loss: 0.0876\n",
      "Epoch 20/200\n",
      "105/105 [==============================] - 0s 332us/step - loss: 0.0797 - val_loss: 0.0874\n",
      "Epoch 21/200\n",
      "105/105 [==============================] - 0s 334us/step - loss: 0.0796 - val_loss: 0.0863\n",
      "Epoch 22/200\n",
      "105/105 [==============================] - 0s 364us/step - loss: 0.0783 - val_loss: 0.0854\n",
      "Epoch 23/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0776 - val_loss: 0.0835\n",
      "Epoch 24/200\n",
      "105/105 [==============================] - 0s 360us/step - loss: 0.0761 - val_loss: 0.0825\n",
      "Epoch 25/200\n",
      "105/105 [==============================] - 0s 359us/step - loss: 0.0753 - val_loss: 0.0816\n",
      "Epoch 26/200\n",
      "105/105 [==============================] - 0s 340us/step - loss: 0.0741 - val_loss: 0.0810\n",
      "Epoch 27/200\n",
      "105/105 [==============================] - 0s 322us/step - loss: 0.0734 - val_loss: 0.0796\n",
      "Epoch 28/200\n",
      "105/105 [==============================] - 0s 364us/step - loss: 0.0725 - val_loss: 0.0787\n",
      "Epoch 29/200\n",
      "105/105 [==============================] - 0s 330us/step - loss: 0.0715 - val_loss: 0.0778\n",
      "Epoch 30/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0712 - val_loss: 0.0768\n",
      "Epoch 31/200\n",
      "105/105 [==============================] - 0s 355us/step - loss: 0.0698 - val_loss: 0.0759\n",
      "Epoch 32/200\n",
      "105/105 [==============================] - 0s 333us/step - loss: 0.0693 - val_loss: 0.0752\n",
      "Epoch 33/200\n",
      "105/105 [==============================] - 0s 341us/step - loss: 0.0683 - val_loss: 0.0743\n",
      "Epoch 34/200\n",
      "105/105 [==============================] - 0s 349us/step - loss: 0.0674 - val_loss: 0.0731\n",
      "Epoch 35/200\n",
      "105/105 [==============================] - 0s 334us/step - loss: 0.0665 - val_loss: 0.0722\n",
      "Epoch 36/200\n",
      "105/105 [==============================] - 0s 350us/step - loss: 0.0655 - val_loss: 0.0714\n",
      "Epoch 37/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0650 - val_loss: 0.0712\n",
      "Epoch 38/200\n",
      "105/105 [==============================] - 0s 362us/step - loss: 0.0641 - val_loss: 0.0698\n",
      "Epoch 39/200\n",
      "105/105 [==============================] - 0s 381us/step - loss: 0.0631 - val_loss: 0.0688\n",
      "Epoch 40/200\n",
      "105/105 [==============================] - 0s 414us/step - loss: 0.0627 - val_loss: 0.0679\n",
      "Epoch 41/200\n",
      "105/105 [==============================] - 0s 332us/step - loss: 0.0616 - val_loss: 0.0671\n",
      "Epoch 42/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0611 - val_loss: 0.0665\n",
      "Epoch 43/200\n",
      "105/105 [==============================] - 0s 350us/step - loss: 0.0601 - val_loss: 0.0654\n",
      "Epoch 44/200\n",
      "105/105 [==============================] - 0s 397us/step - loss: 0.0596 - val_loss: 0.0646\n",
      "Epoch 45/200\n",
      "105/105 [==============================] - 0s 404us/step - loss: 0.0586 - val_loss: 0.0638\n",
      "Epoch 46/200\n",
      "105/105 [==============================] - 0s 375us/step - loss: 0.0582 - val_loss: 0.0635\n",
      "Epoch 47/200\n",
      "105/105 [==============================] - 0s 348us/step - loss: 0.0577 - val_loss: 0.0621\n",
      "Epoch 48/200\n",
      "105/105 [==============================] - 0s 333us/step - loss: 0.0568 - val_loss: 0.0614\n",
      "Epoch 49/200\n",
      "105/105 [==============================] - 0s 347us/step - loss: 0.0556 - val_loss: 0.0610\n",
      "Epoch 50/200\n",
      "105/105 [==============================] - 0s 337us/step - loss: 0.0547 - val_loss: 0.0603\n",
      "Epoch 51/200\n",
      "105/105 [==============================] - 0s 373us/step - loss: 0.0545 - val_loss: 0.0592\n",
      "Epoch 52/200\n",
      "105/105 [==============================] - 0s 350us/step - loss: 0.0536 - val_loss: 0.0581\n",
      "Epoch 53/200\n",
      "105/105 [==============================] - 0s 338us/step - loss: 0.0529 - val_loss: 0.0574\n",
      "Epoch 54/200\n",
      "105/105 [==============================] - 0s 345us/step - loss: 0.0520 - val_loss: 0.0574\n",
      "Epoch 55/200\n",
      "105/105 [==============================] - 0s 334us/step - loss: 0.0518 - val_loss: 0.0560\n",
      "Epoch 56/200\n",
      "105/105 [==============================] - 0s 330us/step - loss: 0.0508 - val_loss: 0.0551\n",
      "Epoch 57/200\n",
      "105/105 [==============================] - 0s 340us/step - loss: 0.0499 - val_loss: 0.0547\n",
      "Epoch 58/200\n",
      "105/105 [==============================] - 0s 351us/step - loss: 0.0495 - val_loss: 0.0538\n",
      "Epoch 59/200\n",
      "105/105 [==============================] - 0s 341us/step - loss: 0.0487 - val_loss: 0.0529\n",
      "Epoch 60/200\n",
      "105/105 [==============================] - 0s 335us/step - loss: 0.0475 - val_loss: 0.0527\n",
      "Epoch 61/200\n",
      "105/105 [==============================] - 0s 346us/step - loss: 0.0475 - val_loss: 0.0518\n",
      "Epoch 62/200\n",
      "105/105 [==============================] - 0s 317us/step - loss: 0.0467 - val_loss: 0.0508\n",
      "Epoch 63/200\n",
      "105/105 [==============================] - 0s 323us/step - loss: 0.0460 - val_loss: 0.0509\n",
      "Epoch 64/200\n",
      "105/105 [==============================] - 0s 312us/step - loss: 0.0458 - val_loss: 0.0494\n",
      "Epoch 65/200\n",
      "105/105 [==============================] - 0s 316us/step - loss: 0.0447 - val_loss: 0.0487\n",
      "Epoch 66/200\n",
      "105/105 [==============================] - 0s 310us/step - loss: 0.0442 - val_loss: 0.0482\n",
      "Epoch 67/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0435 - val_loss: 0.0478\n",
      "Epoch 68/200\n",
      "105/105 [==============================] - 0s 376us/step - loss: 0.0435 - val_loss: 0.0468\n",
      "Epoch 69/200\n",
      "105/105 [==============================] - 0s 329us/step - loss: 0.0424 - val_loss: 0.0462\n",
      "Epoch 70/200\n",
      "105/105 [==============================] - 0s 321us/step - loss: 0.0417 - val_loss: 0.0454\n",
      "Epoch 71/200\n",
      "105/105 [==============================] - 0s 413us/step - loss: 0.0410 - val_loss: 0.0451\n",
      "Epoch 72/200\n",
      "105/105 [==============================] - 0s 308us/step - loss: 0.0406 - val_loss: 0.0441\n",
      "Epoch 73/200\n",
      "105/105 [==============================] - 0s 343us/step - loss: 0.0400 - val_loss: 0.0435\n",
      "Epoch 74/200\n",
      "105/105 [==============================] - 0s 327us/step - loss: 0.0390 - val_loss: 0.0429\n",
      "Epoch 75/200\n",
      "105/105 [==============================] - 0s 354us/step - loss: 0.0386 - val_loss: 0.0422\n",
      "Epoch 76/200\n",
      "105/105 [==============================] - 0s 329us/step - loss: 0.0382 - val_loss: 0.0417\n",
      "Epoch 77/200\n",
      "105/105 [==============================] - 0s 325us/step - loss: 0.0375 - val_loss: 0.0410\n",
      "Epoch 78/200\n",
      "105/105 [==============================] - 0s 321us/step - loss: 0.0370 - val_loss: 0.0404\n",
      "Epoch 79/200\n",
      "105/105 [==============================] - 0s 338us/step - loss: 0.0362 - val_loss: 0.0398\n",
      "Epoch 80/200\n",
      "105/105 [==============================] - 0s 322us/step - loss: 0.0358 - val_loss: 0.0394\n",
      "Epoch 81/200\n",
      "105/105 [==============================] - 0s 327us/step - loss: 0.0354 - val_loss: 0.0386\n",
      "Epoch 82/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0348 - val_loss: 0.0380\n",
      "Epoch 83/200\n",
      "105/105 [==============================] - 0s 341us/step - loss: 0.0343 - val_loss: 0.0378\n",
      "Epoch 84/200\n",
      "105/105 [==============================] - 0s 325us/step - loss: 0.0338 - val_loss: 0.0369\n",
      "Epoch 85/200\n",
      "105/105 [==============================] - 0s 344us/step - loss: 0.0334 - val_loss: 0.0364\n",
      "Epoch 86/200\n",
      "105/105 [==============================] - 0s 331us/step - loss: 0.0328 - val_loss: 0.0361\n",
      "Epoch 87/200\n",
      "105/105 [==============================] - 0s 347us/step - loss: 0.0323 - val_loss: 0.0356\n",
      "Epoch 88/200\n",
      "105/105 [==============================] - 0s 361us/step - loss: 0.0319 - val_loss: 0.0348\n",
      "Epoch 89/200\n",
      "105/105 [==============================] - 0s 335us/step - loss: 0.0315 - val_loss: 0.0342\n",
      "Epoch 90/200\n",
      "105/105 [==============================] - 0s 353us/step - loss: 0.0309 - val_loss: 0.0337\n",
      "Epoch 91/200\n",
      "105/105 [==============================] - 0s 329us/step - loss: 0.0305 - val_loss: 0.0332\n",
      "Epoch 92/200\n",
      "105/105 [==============================] - 0s 351us/step - loss: 0.0304 - val_loss: 0.0326\n",
      "Epoch 93/200\n",
      "105/105 [==============================] - 0s 313us/step - loss: 0.0294 - val_loss: 0.0322\n",
      "Epoch 94/200\n",
      "105/105 [==============================] - 0s 318us/step - loss: 0.0290 - val_loss: 0.0317\n",
      "Epoch 95/200\n",
      "105/105 [==============================] - 0s 350us/step - loss: 0.0287 - val_loss: 0.0312\n",
      "Epoch 96/200\n",
      "105/105 [==============================] - 0s 445us/step - loss: 0.0282 - val_loss: 0.0307\n",
      "Epoch 97/200\n",
      "105/105 [==============================] - 0s 337us/step - loss: 0.0277 - val_loss: 0.0303\n",
      "Epoch 98/200\n",
      "105/105 [==============================] - 0s 338us/step - loss: 0.0272 - val_loss: 0.0299\n",
      "Epoch 99/200\n",
      "105/105 [==============================] - 0s 334us/step - loss: 0.0270 - val_loss: 0.0293\n",
      "Epoch 100/200\n",
      "105/105 [==============================] - 0s 309us/step - loss: 0.0261 - val_loss: 0.0291\n",
      "Epoch 101/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0260 - val_loss: 0.0287\n",
      "Epoch 102/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0258 - val_loss: 0.0280\n",
      "Epoch 103/200\n",
      "105/105 [==============================] - 0s 324us/step - loss: 0.0253 - val_loss: 0.0277\n",
      "Epoch 104/200\n",
      "105/105 [==============================] - 0s 335us/step - loss: 0.0250 - val_loss: 0.0279\n",
      "Epoch 105/200\n",
      "105/105 [==============================] - 0s 387us/step - loss: 0.0247 - val_loss: 0.0268\n",
      "Epoch 106/200\n",
      "105/105 [==============================] - 0s 330us/step - loss: 0.0241 - val_loss: 0.0264\n",
      "Epoch 107/200\n",
      "105/105 [==============================] - 0s 311us/step - loss: 0.0237 - val_loss: 0.0260\n",
      "Epoch 108/200\n",
      "105/105 [==============================] - 0s 299us/step - loss: 0.0235 - val_loss: 0.0256\n",
      "Epoch 109/200\n",
      "105/105 [==============================] - 0s 349us/step - loss: 0.0230 - val_loss: 0.0252\n",
      "Epoch 110/200\n",
      "105/105 [==============================] - 0s 340us/step - loss: 0.0228 - val_loss: 0.0248\n",
      "Epoch 111/200\n",
      "105/105 [==============================] - 0s 326us/step - loss: 0.0224 - val_loss: 0.0244\n",
      "Epoch 112/200\n",
      "105/105 [==============================] - 0s 378us/step - loss: 0.0221 - val_loss: 0.0242\n",
      "Epoch 113/200\n",
      "105/105 [==============================] - 0s 316us/step - loss: 0.0218 - val_loss: 0.0242\n",
      "Epoch 114/200\n",
      "105/105 [==============================] - 0s 316us/step - loss: 0.0214 - val_loss: 0.0235\n",
      "Epoch 115/200\n",
      "105/105 [==============================] - 0s 312us/step - loss: 0.0212 - val_loss: 0.0229\n",
      "Epoch 116/200\n",
      "105/105 [==============================] - 0s 313us/step - loss: 0.0207 - val_loss: 0.0229\n",
      "Epoch 117/200\n",
      "105/105 [==============================] - 0s 304us/step - loss: 0.0204 - val_loss: 0.0222\n",
      "Epoch 118/200\n",
      "105/105 [==============================] - 0s 333us/step - loss: 0.0202 - val_loss: 0.0219\n",
      "Epoch 119/200\n",
      "105/105 [==============================] - 0s 406us/step - loss: 0.0197 - val_loss: 0.0219\n",
      "Epoch 120/200\n",
      "105/105 [==============================] - 0s 416us/step - loss: 0.0197 - val_loss: 0.0213\n",
      "Epoch 121/200\n",
      "105/105 [==============================] - 0s 374us/step - loss: 0.0192 - val_loss: 0.0209\n",
      "Epoch 122/200\n",
      "105/105 [==============================] - 0s 362us/step - loss: 0.0191 - val_loss: 0.0207\n",
      "Epoch 123/200\n",
      "105/105 [==============================] - 0s 338us/step - loss: 0.0189 - val_loss: 0.0203\n",
      "Epoch 124/200\n",
      "105/105 [==============================] - 0s 345us/step - loss: 0.0185 - val_loss: 0.0200\n",
      "Epoch 125/200\n",
      "105/105 [==============================] - 0s 352us/step - loss: 0.0183 - val_loss: 0.0198\n",
      "Epoch 126/200\n",
      "105/105 [==============================] - 0s 360us/step - loss: 0.0178 - val_loss: 0.0194\n",
      "Epoch 127/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0177 - val_loss: 0.0192\n",
      "Epoch 128/200\n",
      "105/105 [==============================] - 0s 330us/step - loss: 0.0174 - val_loss: 0.0190\n",
      "Epoch 129/200\n",
      "105/105 [==============================] - 0s 333us/step - loss: 0.0171 - val_loss: 0.0186\n",
      "Epoch 130/200\n",
      "105/105 [==============================] - 0s 337us/step - loss: 0.0170 - val_loss: 0.0184\n",
      "Epoch 131/200\n",
      "105/105 [==============================] - 0s 353us/step - loss: 0.0166 - val_loss: 0.0181\n",
      "Epoch 132/200\n",
      "105/105 [==============================] - 0s 349us/step - loss: 0.0165 - val_loss: 0.0178\n",
      "Epoch 133/200\n",
      "105/105 [==============================] - 0s 360us/step - loss: 0.0161 - val_loss: 0.0176\n",
      "Epoch 134/200\n",
      "105/105 [==============================] - 0s 332us/step - loss: 0.0160 - val_loss: 0.0175\n",
      "Epoch 135/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0158 - val_loss: 0.0171\n",
      "Epoch 136/200\n",
      "105/105 [==============================] - 0s 328us/step - loss: 0.0154 - val_loss: 0.0171\n",
      "Epoch 137/200\n",
      "105/105 [==============================] - 0s 325us/step - loss: 0.0152 - val_loss: 0.0166\n",
      "Epoch 138/200\n",
      "105/105 [==============================] - 0s 357us/step - loss: 0.0151 - val_loss: 0.0165\n",
      "Epoch 139/200\n",
      "105/105 [==============================] - 0s 363us/step - loss: 0.0149 - val_loss: 0.0163\n",
      "Epoch 140/200\n",
      "105/105 [==============================] - 0s 325us/step - loss: 0.0147 - val_loss: 0.0166\n",
      "Epoch 141/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0146 - val_loss: 0.0168\n",
      "Epoch 142/200\n",
      "105/105 [==============================] - 0s 328us/step - loss: 0.0147 - val_loss: 0.0160\n",
      "Epoch 143/200\n",
      "105/105 [==============================] - 0s 336us/step - loss: 0.0144 - val_loss: 0.0154\n",
      "Epoch 144/200\n",
      "105/105 [==============================] - 0s 339us/step - loss: 0.0140 - val_loss: 0.0152\n",
      "Epoch 145/200\n",
      "105/105 [==============================] - 0s 326us/step - loss: 0.0138 - val_loss: 0.0151\n",
      "Epoch 146/200\n",
      "105/105 [==============================] - 0s 316us/step - loss: 0.0137 - val_loss: 0.0154\n",
      "Epoch 147/200\n",
      "105/105 [==============================] - 0s 318us/step - loss: 0.0136 - val_loss: 0.0148\n",
      "Epoch 148/200\n",
      "105/105 [==============================] - 0s 309us/step - loss: 0.0133 - val_loss: 0.0152\n",
      "Epoch 149/200\n",
      "105/105 [==============================] - 0s 305us/step - loss: 0.0132 - val_loss: 0.0145\n",
      "Epoch 150/200\n",
      "105/105 [==============================] - 0s 304us/step - loss: 0.0130 - val_loss: 0.0145\n",
      "Epoch 151/200\n",
      "105/105 [==============================] - 0s 323us/step - loss: 0.0128 - val_loss: 0.0143\n",
      "Epoch 152/200\n",
      "105/105 [==============================] - 0s 352us/step - loss: 0.0128 - val_loss: 0.0142\n",
      "Epoch 153/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0125 - val_loss: 0.0136\n",
      "Epoch 154/200\n",
      "105/105 [==============================] - 0s 312us/step - loss: 0.0124 - val_loss: 0.0134\n",
      "Epoch 155/200\n",
      "105/105 [==============================] - 0s 300us/step - loss: 0.0123 - val_loss: 0.0133\n",
      "Epoch 156/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0122 - val_loss: 0.0133\n",
      "Epoch 157/200\n",
      "105/105 [==============================] - 0s 315us/step - loss: 0.0120 - val_loss: 0.0129\n",
      "Epoch 158/200\n",
      "105/105 [==============================] - 0s 303us/step - loss: 0.0119 - val_loss: 0.0132\n",
      "Epoch 159/200\n",
      "105/105 [==============================] - 0s 313us/step - loss: 0.0118 - val_loss: 0.0127\n",
      "Epoch 160/200\n",
      "105/105 [==============================] - 0s 317us/step - loss: 0.0117 - val_loss: 0.0126\n",
      "Epoch 161/200\n",
      "105/105 [==============================] - 0s 321us/step - loss: 0.0116 - val_loss: 0.0131\n",
      "Epoch 162/200\n",
      "105/105 [==============================] - 0s 302us/step - loss: 0.0115 - val_loss: 0.0127\n",
      "Epoch 163/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0113 - val_loss: 0.0122\n",
      "Epoch 164/200\n",
      "105/105 [==============================] - 0s 319us/step - loss: 0.0112 - val_loss: 0.0120\n",
      "Epoch 165/200\n",
      "105/105 [==============================] - 0s 311us/step - loss: 0.0111 - val_loss: 0.0120\n",
      "Epoch 166/200\n",
      "105/105 [==============================] - 0s 304us/step - loss: 0.0110 - val_loss: 0.0118\n",
      "Epoch 167/200\n",
      "105/105 [==============================] - 0s 329us/step - loss: 0.0108 - val_loss: 0.0116\n",
      "Epoch 168/200\n",
      "105/105 [==============================] - 0s 305us/step - loss: 0.0108 - val_loss: 0.0116\n",
      "Epoch 169/200\n",
      "105/105 [==============================] - 0s 310us/step - loss: 0.0107 - val_loss: 0.0118\n",
      "Epoch 170/200\n",
      "105/105 [==============================] - 0s 324us/step - loss: 0.0107 - val_loss: 0.0114\n",
      "Epoch 171/200\n",
      "105/105 [==============================] - 0s 308us/step - loss: 0.0106 - val_loss: 0.0112\n",
      "Epoch 172/200\n",
      "105/105 [==============================] - 0s 308us/step - loss: 0.0105 - val_loss: 0.0111\n",
      "Epoch 173/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0104 - val_loss: 0.0111\n",
      "Epoch 174/200\n",
      "105/105 [==============================] - 0s 309us/step - loss: 0.0103 - val_loss: 0.0111\n",
      "Epoch 175/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0102 - val_loss: 0.0110\n",
      "Epoch 176/200\n",
      "105/105 [==============================] - 0s 309us/step - loss: 0.0102 - val_loss: 0.0109\n",
      "Epoch 177/200\n",
      "105/105 [==============================] - 0s 313us/step - loss: 0.0101 - val_loss: 0.0108\n",
      "Epoch 178/200\n",
      "105/105 [==============================] - 0s 314us/step - loss: 0.0100 - val_loss: 0.0112\n",
      "Epoch 179/200\n",
      "105/105 [==============================] - 0s 302us/step - loss: 0.0100 - val_loss: 0.0107\n",
      "Epoch 180/200\n",
      "105/105 [==============================] - 0s 316us/step - loss: 0.0098 - val_loss: 0.0104\n",
      "Epoch 181/200\n",
      "105/105 [==============================] - 0s 315us/step - loss: 0.0098 - val_loss: 0.0107\n",
      "Epoch 182/200\n",
      "105/105 [==============================] - 0s 310us/step - loss: 0.0097 - val_loss: 0.0103\n",
      "Epoch 183/200\n",
      "105/105 [==============================] - 0s 317us/step - loss: 0.0096 - val_loss: 0.0104\n",
      "Epoch 184/200\n",
      "105/105 [==============================] - 0s 331us/step - loss: 0.0095 - val_loss: 0.0101\n",
      "Epoch 185/200\n",
      "105/105 [==============================] - 0s 299us/step - loss: 0.0094 - val_loss: 0.0104\n",
      "Epoch 186/200\n",
      "105/105 [==============================] - 0s 301us/step - loss: 0.0094 - val_loss: 0.0100\n",
      "Epoch 187/200\n",
      "105/105 [==============================] - 0s 328us/step - loss: 0.0094 - val_loss: 0.0102\n",
      "Epoch 188/200\n",
      "105/105 [==============================] - 0s 306us/step - loss: 0.0093 - val_loss: 0.0100\n",
      "Epoch 189/200\n",
      "105/105 [==============================] - 0s 302us/step - loss: 0.0093 - val_loss: 0.0099\n",
      "Epoch 190/200\n",
      "105/105 [==============================] - 0s 322us/step - loss: 0.0092 - val_loss: 0.0097\n",
      "Epoch 191/200\n",
      "105/105 [==============================] - 0s 315us/step - loss: 0.0092 - val_loss: 0.0097\n",
      "Epoch 192/200\n",
      "105/105 [==============================] - 0s 303us/step - loss: 0.0092 - val_loss: 0.0097\n",
      "Epoch 193/200\n",
      "105/105 [==============================] - 0s 307us/step - loss: 0.0091 - val_loss: 0.0098\n",
      "Epoch 194/200\n",
      "105/105 [==============================] - 0s 352us/step - loss: 0.0090 - val_loss: 0.0096\n",
      "Epoch 195/200\n",
      "105/105 [==============================] - 0s 313us/step - loss: 0.0089 - val_loss: 0.0100\n",
      "Epoch 196/200\n",
      "105/105 [==============================] - 0s 359us/step - loss: 0.0090 - val_loss: 0.0103\n",
      "Epoch 197/200\n",
      "105/105 [==============================] - 0s 341us/step - loss: 0.0089 - val_loss: 0.0096\n",
      "Epoch 198/200\n",
      "105/105 [==============================] - 0s 322us/step - loss: 0.0088 - val_loss: 0.0094\n",
      "Epoch 199/200\n",
      "105/105 [==============================] - 0s 318us/step - loss: 0.0088 - val_loss: 0.0093\n",
      "Epoch 200/200\n",
      "105/105 [==============================] - 0s 295us/step - loss: 0.0088 - val_loss: 0.0092\n"
     ]
    }
   ],
   "source": [
    "history = model.fit(x_train, y_train,\n",
    "                    batch_size=4,\n",
    "                    epochs=200,\n",
    "                    verbose=1,\n",
    "                    validation_data=(x_test, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see above, the network trains down to a validation MSE below 0.01. Notice that both the training loss (\"loss\") and the validation loss (\"val_loss\") are reported. It's normal for the training loss to be somewhat lower than the validation loss, since the network is optimized directly on the training data. But if the training loss is much lower than the validation loss, the network is overfitting, and we should not expect it to perform well on unseen data.\n",
    "\n",
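    "We can evaluate the test set one last time at the end using `evaluate`. Because we passed the test set as `validation_data` during training, the result should match the final `val_loss` reported above."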
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "45/45 [==============================] - 0s 97us/step\n",
      "Test loss: 0.00922720053543647\n"
     ]
    }
   ],
   "source": [
    "score = model.evaluate(x_test, y_test)\n",
    "print('Test loss:', score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get the raw predictions for the test set, use `predict`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "predicted 0.72, actual 0.96\n",
      "predicted 0.53, actual 0.52\n",
      "predicted 0.87, actual 0.84\n",
      "predicted 0.72, actual 0.72\n",
      "predicted 0.16, actual 0.08\n",
      "predicted 0.13, actual 0.08\n",
      "predicted 0.13, actual 0.08\n",
      "predicted 0.15, actual 0.08\n",
      "predicted 0.62, actual 0.60\n",
      "predicted 0.54, actual 0.52\n"
     ]
    }
   ],
   "source": [
    "y_pred = model.predict(x_test)\n",
    "\n",
    "# predict returns an array of shape (num_samples, 1), so take element 0 of each row\n",
    "for yp, ya in zip(y_pred[:10], y_test[:10]):\n",
    "    print(\"predicted %0.2f, actual %0.2f\" % (yp[0], ya))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can manually calculate MSE as a sanity check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MSE is 0.0092\n"
     ]
    }
   ],
   "source": [
    "def MSE(y_pred, y_test):\n",
    "    # y_pred has shape (num_samples, 1); flatten both arrays so they line up element-wise\n",
    "    return np.mean((np.ravel(y_pred) - np.ravel(y_test)) ** 2)\n",
    "\n",
    "print(\"MSE is %0.4f\" % MSE(y_pred, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also predict the value of a single unknown example, or a set of them, in the following way:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "predicted 0.723, actual 0.960\n"
     ]
    }
   ],
   "source": [
    "x_sample = x_test[0].reshape(1, 3)   # shape must be (num_samples, 3), even if num_samples = 1\n",
    "y_sample_pred = model.predict(x_sample)   # regression output, shape (1, 1)\n",
    "\n",
    "print(\"predicted %0.3f, actual %0.3f\" % (y_sample_pred[0][0], y_test[0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've now finished introducing Keras for regression. Note that it is a far more powerful and convenient way of training neural networks than the implementation we wrote by hand. Keras's strengths will become even more apparent when we introduce classification in the next lesson, along with convolutional networks and the various optimization tricks that Keras makes available."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
